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We present a simple, robust and efficient method for varying the parameters in a many-body 
wave function to optimize the expectation value of the energy. The effectiveness of the method is 
demonstrated by optimizing the parameters in flexible Jastrow factors, that include 3-body electron- 
electron- nucleus correlation terms, for the NO2 and decapentaene (C10H12) molecules. The basic 
idea is to add terms to the straightforward expression for the Hessian of the energy that have zero 
expectation value, but that cancel much of the statistical fluctuations for a finite Monte Carlo sample. 
The method is compared to what is currently the most popular method for optimizing many-body 
wave functions, namely minimization of the variance of the local energy. The most efficient wave 
function is obtained by optimizing a linear combination of the energy and the variance. 



Quantum Monte Carlo methods Q, 0, El are some 
of the most accurate and efficient methods for treating 
many body systems. The success of these methods is in 
large part due to the flexibility in the form of the trial 
wave functions that results from doing integrals by Monte 
Carlo. Since the capability to efficiently optimize the pa- 
rameters in trial wave functions is crucial to the success 
of both the variational Monte Carlo (VMC) and the dif- 
fusion Monte Carlo (DMC) methods, a lot of effort has 
been put into inventing better optimization methods. 

The variance minimization jj, |5( method has become 
the most frequently used method for optimizing many- 
body wave functions because it is far more efficient than 
straightforward energy minimization. The reason is that, 
for a sufficiently flexible variational wave function, it is 
possible to lower the energy on the finite set of Monte 
Carlo (MC) configurations on which the optimization 
is performed, while in fact raising the true expectation 
value of the energy. On the other hand, if the variance 
of the local energy is minimized, each term in the sum 
over MC configurations is bounded from below by zero 
and the problem is far less severe |5j ■ 

Nevertheless, in recent years several clever methods 
have been invented that optimize the ener gy r ather than 
the variance [E S IE S Efl, E3, E3, IIS 13 13 The mo- 
tivations for this are four fold. First, one typically seeks 
the lowest energy in either a variational or a diffusion 
Monte Carlo calculation, rather than the lowest variance. 
Second, although the variance minimization method has 
been used to optimize both the Jastrow coefficients and 
the determinantal coefficients (the coefficients in front of 
the determinants, and in the expansion of the orbitals 
in a basis, and the exponents in the Slater/Gaussian ba- 
sis functions) 0, Q, ^J, it takes many iterations to 
optimize the latter and the optimization can get stuck 
in multiple local minima. So, most authors have used 
variance minimization for the Jastrow parameters only, 
where these problems are absent. Third, for a given form 
of the trial wave function, energy-minimized wave func- 
tions on average yield more accurate values of other ex- 
pectation values than variance minimized wave functions 
do ^^|. Fourth, the Hellman-Feynman theorem, com- 



bined with a variance reduction technique [lflj . can be 
used with energy-minimized wave functions to compute 
forces on nuclei. 

The various energy minimization methods are suc- 
cessful in varying degrees. The generalized eigenvalue 
method of Nightingale and Melik-Alaverdian [2j is the 
most efficient choice for optimizing linear parameters, 
but for nonlinear parameters they use variance minimiza- 
tion. The effective fluctuation potential method [ToL HH 
H2I H3I ITij is the most successful method for nonlinear 
parameters and has been applied to optimizing the or- 
bitals [ill IT3 | and the linear coefficients in a multideter- 
minantal wave function |l2t ll^j , and, has been extended 
to excited states 01 ■ ^ is not straightforward to use this 
method to optimize Jastrow factors, but Prendergast, 
Bevan and Fahy have formulated a version for pe- 
riodic systems and have optimized an impressively large 
number of parameters. However, the method is complex 
and needs to be reformulated for finite systems. The 
stochastic reconfiguration method 0] is related to the 
effective fluctuation potential method and is simpler but 
less efficient The Newton method as implemented 

in Ref . is the most straightforward method but is ineffi- 
cient and unstable. The earlier methods 0, Q have been 
applied only to very small systems or very few parame- 
ters. 

The purpose of this letter is to show that it is possible 
to devise an energy minimization method that is simple, 
robust and efficient. The method can be applied to op- 
timizing many-body wave functions, for both continuum 
and lattice problems. The trick to doing this is to mod- 
ify the straightforward expression for the Hessian of the 
energy by adding a term that has zero expectation value 
for an infinite MC sample, but that is nonzero and can- 
cels much of the statistical fluctuations for a finite MC 
sample. Before we describe this in detail, we review the 
variance minimization method. 

Variance minimization: The parameters in a real- 
valued trial wave function tp are varied to minimize the 
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variance of the local energy, 
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where E^ — Hip/ip is the local energy, (•} denotes a ip 2 - 
weighted expectation value, and E = (-El) is the expec- 
tation value of the energy. The derivative of a 1 with 
respect to the i th parameter, Ci, is given by 
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where subscript i denotes derivative with respect to Cj. 

Since the variance minimization method can be viewed 
as a fit of the local energy on a fixed set of Monte Carlo 
configurations 0, an alternative expression follows from 
ignoring the change of the wave function: 

(a% = 2(E L , i {E L -E)) = 2({E Ij , i -E i )(E L -E)){3 l ) 

Then the usual Levenberg-Marquardt approxima- 
tion to the Hessian matrix is given by 

(a 2 ) ij =2((E hti -E i )(E htj -E j )) 

= 2 {(E^E^) - E t (E htj ) ~ Ej (E L<i ) + E t E^4) 

This Hessian is positive definite by construction. 
Energy minimization: The elements of the gradient are 
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= 2 



^(E L -E)^ (by Hermiticity). (6) 



We note that the step from Eq. [SJto Eq. [5]was made not 
just in the interest of simplicity, but more importantly 
because the expression in Eq. has zero fluctuations in 
the limit that tp is an exact eigenstate, whereas the ex- 
pression in Eq. Elhas large fluctuations. 

Taking the derivative of Eq. H3 the Hessian is 
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This is nothing more than a rearrangement of terms in 
the Hessian in Ref. We now make two changes to 
the above expression. First, we note that the last term is 
not symmetric in i and j when approximated by a finite 



sample, whereas the true Hessian of course is symmetric. 
So, we symmetrize it. This change does not significantly 
alter the efficiency of the method, but it does have the 
advantage that the eigensystem is real. Next, we note 
that Eq. Eland all except the last term in Eq.[7|are in the 
form of a covariance, ((ab) — (a)(b)). The fluctuations of 
(ab) — (a) (b) are in most cases smaller than those of (ab) , 
(e.g. if a and b are weakly correlated), and, they are much 
smaller if (a 2 ) — (a) 2 <C | (a) \ and a is not strongly cor- 
related with 1/6. Since the Hamiltonian is Hermitian it 
follows, as also noted in Ref. |8|, that (Ej,_j) = 0. Hence, 
an alternative symmetric expression |2jJ for the Hessian, 
written entirely in terms of covariances, is: 
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The additional terms we have added in have zero expec- 
tation value for an infinite sample but, in practice, cancel 
most of the fluctuations in the existing terms for a finite 
sample, making the method vastly more efficient. Note 
also that Eij in Eq. \7\ evaluated on a finite sample, is 
not invariant under renormalization of the wave function 
by a parameter-dependent constant but in Eq. [H] is . 

We note that Eqs.[S]and[S]are not the gradient and the 
Hessian of the energy estimated on the particular finite 
set of sampled points. In fact, any method that attempts 
to minimize the energy, by minimizing the energy evalu- 
ated on a finite sample of Monte Carlo points, is bound 
to require a very large sample and therefore be highly in- 
efficient for the reason discussed in the introduction. Our 
modifications of the straightforward expressions for the 
gradient and Hessian are similar in spirit to the work of 
Nightingale and Melik-Alaverdian 9]. A straightforward 
minimization of the energy on a Monte Carlo sample re- 
sults in a symmetric Hamiltonian matrix, but they derive 
a nonsymmetric Hamiltonian matrix that yields exact pa- 
rameters from a finite sample in the limit that the basis 
functions span an invariant subspace. 

Newton method: In both the energy and the variance 
minimization methods, the gradient, b, and the Hessian, 
A, are used to update the variational parameters, c, us- 
ing Newton's method, c ncxt = c currcnt - A _1 b. 

Note that if we are far away from the minimum, or 
if the number of Monte Carlo samples, Nmc, is small, 
then the Hessian of Eq. [S] need not be positive definite, 
whereas the approximate Hessian of Eq.^is always posi- 
tive definite. Further, even for positive definite Hessians, 
the new parameter values may make the wave function 
worse if one is not sufficiently close to the minimum for 
the quadratic approximation to hold or if the approxi- 
mate Hessian of Eq.0]is not sufficiently accurate. Hence, 
we determine the eigenvalues of the Hessian and add to 
the diagonal of the Hessian the negative of the most nega- 
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tive eigenvalue (if one exists) plus a constant adiag- This 
shifts the eigenvalues by the added constant. As adi ag 
is increased, the proposed parameter changes become 
smaller and rotate from the Newtonian direction to the 
steepest descent direction. As an aside, we note that for 
the form of the wave functions used and the molecules 
studied here, we find that the eigenvalues of the Hessians 
of Eqs. |H1 and 0] span 11 orders of magnitude when the 
parameters are close to optimal. 

Results: We have tested the methods on NO2 and the 
excited 1 B U state of decapentaene (C10H12) using a non- 
local pseudopotential to remove core electrons. We opti- 
mize the parameters in a flexible Jastrow factor 0] that 
contains electron-electron, electron-nucleus and electron- 
electron- nucleus terms, making a total of 43 free param- 
eters. The starting Jastrow is a crude electron-electron 
Jastrow of the form exp( j^), where b is set by the cusp 
conditions for antiparallel- and parallel-spin electrons. 
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FIG. 2: Same as Fig.^but for the rms fluctuations of the local 
energy, a, rather than the energy. The linear combination 
a is half way between those from energy minimization and 
variance minimization. 
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FIG. 1: Energy of NO2 versus iteration number for energy 
minimization. Inset: the later iterations on an expanded scale 
and also the energies from minimizing the variance and mini- 
mizing the linear combination. The linear combination yields 
almost as good an energy as energy minimization. 

In Fig. n we plot the energy, and, in Fig. the root 
mean square fluctuations of the local energy, a, of NO2 
as a function of the iteration number as we energy opti- 
mize the 43 free parameters in the Jastrow. The first 6 
iterations employ a very small MC sample, Nmc = 1000, 
and adiag = 0.2. For each of the next 6 iterations we in- 
crease N MC by a multiplicative factor of 4 and decrease 
adiag by a multiplicative factor of 0.1. The remaining 11 
iterations are performed with the values at the end of this 
process, namely, Nmc = 4, 096, 000, and adi ag = 2x 10~ 7 . 
(Setting adiag = would work equally well for these it- 
erations.) The first few iterations are extremely fast due 
to the small value of Nmc and achieve most of the opti- 
mization. In the insets we show the later iterations on an 
expanded scale, and also the energies and a from mini- 
mizing the variance (using Eqs.|2]and^) and from mini- 
mizing a linear combination, with the variance having a 
weight of 0.05 and the energy a weight of 0.95. Of course, 



Iteration number 

FIG. 3: The autocorrelation time, T cor r, of NO2 versus iter- 
ation number. Energy minimization gives the smallest T CO rr, 
variance minimization the largest, and, the mixed minimiza- 
tion a value that is close to that from energy minimization. 

the variance-minimized wave functions have a lower a 
and the energy-minimized wave functions a lower energy. 
The mixed-minimization wave functions have an energy 
that is almost as good as that of the energy-minimized 
wave functions, and, a a that is in between. 

The computational time required to reduce the sta- 
tistical error to a given value is proportional to c 2 T corr , 
where T colT is the autocorrelation time of the energy as 
defined in Ref. j^j ■ One can argue that in DMC the en- 
ergy minimized wave functions will have a smaller T corr 
than variance minimized wave functions, since both a 
and T corr serve to lower the DMC energy relative to the 
variational energy. In Fig. [3J we show T corr for each of 
the three methods. We see that the energy minimized 
wave function has a smaller value of T corr than the vari- 
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FIG. 4: Same as Fig.Qbut for decapentaene (C10H12). 



ance minimized wave function, even in VMC. The mixed 
minimization wave function has a T corr that is close to 
that of the energy minimized wave function. The value 
of <7 2 T corr for the variance, energy and mixed optimiza- 
tions is 1.08, 1.03 and 0.98 H in VMC, and, 3.21, 2.87 
and 2.75 H in DMC using a time-step of 0.05 H — 1 , where 
the last digit in a 2 T corl is uncertain. Hence, the wave 
functions obtained from the mixed optimization are the 
most efficient ones. 

We note that E and a are fully converged in 12 it- 
erations. In fact, it is possible to converge them in 4-5 
iterations if we use from the outset a larger value for Nmc 
and reduce the value for adiag more rapidly. However, it 
is more computationally efficient to start the optimiza- 
tion by performing several iterations with a small Nmc- 

In Fig.0]we plot the energy of the excited 1 B U state of a 
larger molecule, decapentaene (G10H12), as a function of 
iteration number. For the first 6 iterations we optimize 
just the 13 parameters in the electron-nucleus and the 
electron-electron Jastrows, and, optimize the full set of 
43 parameters starting from iteration 7. As in the case 
of NO2, we employ Amc = 1000 and adiag = 0.2 during 
the first six iterations. The next six are performed with 
^Vmc = 16000 and the final 11 iterations are performed 
with Nmc = 256000 and a diag = 2 x 10" 5 . The results 
are similar to those for NO2, and so in the interest of 
brevity we omit plots for a and T corr . 

It is remarkable that most of the optimization can be 
done with as few as 1000 MC configurations. In contrast, 
if Eq. |7| is used for the Hessian, then the fluctuations are 
much larger and the method becomes unstable for the 
molecules treated here even if we increase the number 
of Monte Carlo configurations, Nmc, by a factor of a 
thousand to 10 6 configurations. (We can make it stable 
by increasing substantially also the value of adiag, but this 
increases the number of iterations needed to converge.) 
Hence, the simple change going from Eq.[7|to Eq.0 that 
entails no additional computational cost, results in a gain 
in efficiency of at least three orders of magnitude. 
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